Bias and Information of Bayesian Adaptive Testing

Since test scores are typically used to differentiate among persons, one
highly desirable property of a test would be that it measure equally well at all
points. Another consideration is that it measure each person precisely. Thus,
an "ideal" test would have a high, horizontal information function.
Unfortunately, this ideal cannot normally be achieved in a fixed-length
conventional test that draws its items from a much larger fixed pool of test
items. Ordinarily, some trade-offs must be made. Relatively high information at
a point can be achieved by "peaking" the test, that is, constructing it of the
most discriminating items in a narrow range of difficulty. A relatively flat but
low information function can be achieved by selecting equidiscriminating items
having a wide range of item difficulty values. The only way to approximate a
high, flat information function is to administer to each person the subset of
items that provides the most information at his/her level of ability, θ. The
problem with this is obvious: θ is unknown before the test is administered.

An adaptive test can select items during the course of testing in such a
way as to attempt to maximize the information obtained for each examinee. This
may be done either by simple branching (administering a more difficult item
after a correct answer and an easier item after an incorrect answer) or by more
elaborate techniques. Owen's (1969, 1975) Bayesian adaptive testing strategy
estimates θ after each item response, then selects the unused test item that is,
in one sense, the most "informative" at the current estimated ability level.
The result is that different persons take different sets of test items; each set
of test items spans a range of difficulty levels approximately tailored to
provide maximal information about the individual examinee.

The information function of the test scores derived from any adaptive
testing procedure should be (1) flatter than that of a peaked test of the same
length and constructed from the same item pool and (2) higher than that of a
rectangular test of the same length drawn from the same item pool. The height
of the adaptive test's information function will be determined in large part by
the discriminations and guessing parameters of the constituent items of the item
pool as well as by test length. The flatness of the information curve (and to
some extent its height) will depend largely on the range of item difficulties in
the pool and on the effectiveness of the adaptive item selection procedure.

Urry (1971) conducted monte carlo simulations of Owen's (1969, 1975)
sequential procedure using three different simulated item banks: two banks of
"ideal" item parameters and one bank of items with the same parameters as the
VSAT (Lord, 1968). Urry's Item Bank A had 20 equidiscriminating items (a = 1.6)
at each of five equally spaced levels on the ability continuum; his Item Bank B
employed five items of the same discrimination (a = 1.6) at each of 20 ability
levels; and Item Bank C employed the parameters actually occurring in the VSAT.
Banks A and B required an average of just over 11 items to test termination.
Bank C required an average of 27.5 items to termination. The other noteworthy
result of Urry's (1971) simulation studies was the magnitude of the fidelity
coefficients. For simulated examinees drawn randomly from a normal (0,1)
population, the observed correlations of .936 (Item Bank A) and .919 (Item
Bank B) are quite high in view of the relatively short test lengths involved.



Jensema (1972) simulated Owen's (1969, 1975) approach to Bayesian testing
using the actual item responses of 100 live examinees to 58 mathematics items
drawn from four conventional pre-college tests taken at full length by the
examinees. From a record of their item-by-item actual test performance, a
computer program constructed artificial protocols of their responses to the
items that would have been administered by Bayesian sequential tests under two
different conditions: with and without differential prior information about
examinees' abilities. Parallel to these two "real data" simulations, Jensema
carried out monte carlo simulations of the Bayesian procedure. These simulations
used 100 simulated examinees and items with logistic ogive parameters identical
to those of the 58 real items. Item scores were generated as a stochastic
function of ability, θ, and the parameters of each item. The adaptive tests were
terminated in each instance when the posterior variance of the Bayesian ability
estimate fell below .0625 or when 30 items had been administered, whichever
occurred first.

In the real-data simulation, mean test length was about 27 items, with or
without differential initial ability estimates. The Bayesian estimates
correlated about .86 with scores on a weighted composite of the four
conventional tests from which the item bank was selected. Jensema did not report
a correlation of ability with test length or with precision of estimate, but he
did observe that the posterior variance criterion terminated the testing only in
the upper portions of the distribution of estimated ability. Jensema interpreted
these results to imply that the item pool was unsatisfactory for adaptive
testing in the lower ability levels due to the low discriminations of the items
in that region of the difficulty continuum. His monte carlo results using the
same item pool resulted in virtually identical mean test lengths and in
correlations of .92 between estimated ability and true ability. He concluded, in
part, that a satisfactory item pool for adaptive testing needs to employ very
highly discriminating items uniformly distributed on the difficulty continuum.
Another conclusion he reached, this one on the basis of monte carlo simulation
with ideal item banks, was that for most purposes little was to be gained by the
use of prior information about examinees to determine a variable initial θ
estimate. Jensema found that using differential prior information resulted in an
average savings of only one test item.

In another monte carlo study of Owen's Bayesian strategy, Jensema (1974)
examined the effects of item parameters and Bayesian test length on test
reliability. He showed that reliability is directly related to the posterior
variance of the Bayesian ability estimate; hence, using a specific value of that
posterior variance as a termination criterion determines the reliability of the
test. Jensema showed that the average number of items required to attain that
reliability varies as a function of the item parameters. With items uniformly
distributed on difficulty, the higher the item discrimination, the shorter the
test.

McBride (1977; McBride & Weiss, 1976) also studied characteristics of the
ability estimates resulting from Owen's (1969, 1975) strategy. These monte
carlo simulations involved (1) an ideal item pool with variable test length; (2)
the effects of guessing and item discrimination in a perfect item pool; (3) the
effects of fixed test length; and (4) the effects of ability level and item pool
configuration. In the first three studies, the performance of the adaptive test
was evaluated on overall indices including the overall bias and mean absolute
error of the ability estimates, the correlation of ability estimates with true
ability (fidelity), and correlations of true and estimated ability levels with
errors and test length.

The fourth study evaluated the performance of this testing strategy in an
item pool with no correlation between difficulty and discrimination parameters,
and using items with high negative and high positive correlations between these
parameters. In contrast to the other studies, characteristics of the ability
estimates were examined as a function of true θ; dependent variables included
bias and information conditional on θ. Contrasting with the first three
studies, which showed little overall mean bias and information, Study 4 showed
severe bias in the conditional θ estimates for all three item pool
configurations. Estimates of θ were unbiased only for five θ values between
θ = 1.0 and θ = -1.0; low θ values were overestimated and high θ values were
underestimated. In addition, the information curves for the three item pool
configurations were not high and flat as would be expected, at least when the
ideal item pool was used in which difficulty and discrimination parameters were
uncorrelated.

Gorman (1980) also examined the bias and information of scores produced by
Owen's Bayesian testing procedure. These analyses were based on two "ideal"
item pools with discriminations of a = .8 and 1.6, in which 101 items were
rectangularly distributed in difficulty, and both true and estimated item
parameters were used. Gorman also studied the effect of applying a correction
for regression (proposed by Urry, 1977) to ability estimates from Owen's testing
procedure, designed to reduce bias in the estimates. His results show
substantial bias in the uncorrected θ estimates, with positive bias for θ levels
below zero, negative bias for θ levels above zero, and higher levels of bias for
the less discriminating items. His data also show that Urry's correction was not
entirely successful in eliminating the bias, since the corrected θ estimates for
θ levels above zero resulted in positive bias. Since Gorman's study used an
ideal, but finite, item pool, however, his results may be partially item pool
dependent. In addition, Gorman's study did not attempt to determine the cause
of the bias in the θ estimates but simply examined one possible approach to
reducing it.

Purpose

The present study was designed to further investigate the nature of the
bias and the information characteristics of Owen's Bayesian adaptive testing
strategy and to examine possible causes of the bias. Factors investigated
included (1) the effects of item discrimination, (2) the effects of fixed vs.
variable test length, and (3) the effect of an accurate prior θ estimate.

Method 

Design 

Monte carlo simulation of Owen's adaptive test was used. Unlike some
previous simulation studies, but similar to Studies 1 to 3 in McBride (1977),
the present studies did not use a prestructured item pool. Rather, the tests
were simulated using a perfect and infinite item pool having any difficulty
parameters required by the item selection process, with restrictions only on the
item discriminations and pseudo-guessing parameters, c. By thus simulating an
infinite item pool, the results of the simulation studies should reveal, within
the limits of sampling error, the inherent properties of the Bayesian adaptive
test, unaffected by the idiosyncrasies of a typical finite item pool.

Similarly, following the procedures of Study 4 in McBride (1977), in order
to permit accurate description of the properties of the testing method as they
vary with trait level, the simulated examinees (simulees) were not drawn
randomly from a specified distribution; rather, a large number of examinees were
simulated at each of a number of trait levels throughout the normally
encountered range.

Examinees 

For the purposes of monte carlo simulation, an examinee i was characterized
by a numerical value, which is the actual trait level θ_i. In each of the eight
data sets generated, there were 3,100 simulees, with 100 at each of 31 θ levels
equally spaced in the interval -3.0 to 3.0. This range of the trait would
include about 99.7% of a population normally distributed on θ, with mean 0 and
variance 1.
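
The simulee design can be written out directly. This is a minimal sketch of the
grid described above (31 equally spaced θ levels from -3.0 to 3.0, with an equal
number of simulees at each level, for 3,100 in total); the variable names are
illustrative only.

```python
N_LEVELS = 31             # theta levels from -3.0 to 3.0 in steps of 0.2
SIMULEES_PER_LEVEL = 100  # 31 * 100 = 3,100 simulees per data set

# Rounding removes floating-point drift such as 3.0000000000000004.
theta_levels = [round(-3.0 + 0.2 * i, 1) for i in range(N_LEVELS)]

# One entry per simulee: the true trait level assigned to that simulee.
simulees = [theta for theta in theta_levels for _ in range(SIMULEES_PER_LEVEL)]

print(theta_levels[0], theta_levels[-1], len(simulees))
```

Because the levels are fixed rather than sampled, every conditional statistic
reported later rests on the same number of simulees at each θ.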

Test Items 


For each separate item administration, an item was computer generated with
the pseudo-guessing (c) parameter held constant at .20, simulating a
five-alternative, multiple-choice item. The item discrimination, a, was constant
for each data set, with a = .80, 1.60, or 2.40 between data sets.

Following McBride (1977), the difficulty (b_g) parameter for each simulated
item administration was determined by the current θ̂ (the prior mean M_{m-1} of
the estimated distribution of θ_i before administering the mth item) and by the
constant item parameters a_g and c_g, according to the formula

    b_g = M_{m-1} - \frac{1}{1.7 a_g} \log \left[ \frac{1 + \sqrt{1 + 8 c_g}}{2} \right]        [1]

Equation 1 gives the item difficulty value having maximal information when
θ_i = M_{m-1} and a_g and c_g are fixed (Birnbaum, 1968, p. 464). Since, in
general, θ_i is unknown and the best available estimate is M_{m-1}, the item
difficulty chosen is the one that is the most informative, given the current
estimate of θ at any point in the adaptive test.
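
The difficulty selection rule can be sketched in a few lines. This assumes that
Equation 1 is Birnbaum's maximum-information result for the three-parameter
logistic model, b = M - log[(1 + sqrt(1 + 8c)) / 2] / (1.7a), with a natural
logarithm; the function name `optimal_difficulty` is hypothetical.

```python
import math

D = 1.7  # logistic scaling constant


def optimal_difficulty(prior_mean, a, c):
    """Difficulty at which an item with fixed a and c is most informative
    for an examinee located at prior_mean (Birnbaum's result)."""
    shift = math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / (D * a)
    return prior_mean - shift


# With c = .20 the selected item is slightly easier than the current estimate,
# and the offset shrinks as discrimination increases.
for a in (0.80, 1.60, 2.40):
    print(a, round(optimal_difficulty(0.0, a, 0.20), 3))
```

With no guessing (c = 0) the offset vanishes and the selected difficulty equals
the current estimate.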

Item Responses 

The dichotomous (0,1) score of any simulee on any item is a probabilistic
function of its status θ_i on the trait θ, the item difficulty b_g, and the
parameters a_g and c_g. The probability P_g(θ_i) of a correct response
(u_g = 1) under the logistic model item characteristic curve is

    P_g(\theta_i) = c_g + (1 - c_g) / \{ 1 + \exp[-1.7 a_g (\theta_i - b_g)] \}        [2]

In order to simulate item responses, each time an item administration took
place the quantity P_g(θ_i) was compared with a pseudo-random number r_gi
generated from a distribution uniform in the interval [0,1]. A score of u_g = 1
was assigned whenever P_g(θ_i) equaled or exceeded r_gi; otherwise, a score of 0
was assigned.
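
The response-generation step can be sketched as follows, assuming the
three-parameter logistic curve with scaling constant 1.7 and the comparison rule
just described. The function names are hypothetical, and Python's generator
draws from [0, 1) rather than the closed interval [0, 1], a difference with no
practical effect here.

```python
import math
import random

D = 1.7  # logistic scaling constant


def p_correct(theta, a, b, c):
    """Probability of a correct response under the logistic model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def simulate_response(theta, a, b, c, rng):
    """Score 1 when the response probability equals or exceeds a uniform draw."""
    r = rng.random()
    return 1 if p_correct(theta, a, b, c) >= r else 0


rng = random.Random(12345)  # seeded only so the sketch is reproducible
scores = [simulate_response(0.0, 1.6, 0.0, 0.20, rng) for _ in range(10000)]
print(sum(scores) / len(scores))  # should be close to p_correct(0, 1.6, 0, .20)
```

For an item whose difficulty equals the simulee's trait level, the probability
is c + (1 - c)/2, so with c = .20 roughly 60% of simulated responses are correct.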

Dependent Variables

For the simulated test of each individual the following were recorded:

     k, the number of items administered;

     M_k, the posterior mean after k items (i.e., θ̂); and

     V_k, the posterior variance after k items (i.e., the variance of θ̂).

These values were averaged at each level of θ across the 100 simulees at that
level, resulting in θ̂_i, the mean of the θ estimates at each level of θ_i
(i = 1, 2, ..., 31), and σ²(θ̂_i), the variance of θ̂ at each θ level. Bias was
determined at each of the θ levels by

    \text{Bias} = (\hat{\theta}_i - \theta_i)        [3]

Information was computed from the formula

    I(\theta_i) = \hat{\theta}_i'^{\,2} / \sigma^2(\hat{\theta}_i)        [4]

where θ̂' is the first derivative of the polynomial regression of θ̂ on θ.
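
The two conditional indices can be sketched as follows. Several choices here are
assumptions rather than details given in the text: the variance is computed as a
population (divide-by-n) variance, and, because the degree of the polynomial
regression is not stated, a simple finite difference of the mean estimates is
swapped in for the regression derivative. All names are hypothetical.

```python
from statistics import mean, pvariance


def summarize_level(theta, estimates, slope):
    """Mean, variance, bias, and information of the estimates at one theta level.

    slope is the derivative of the mean estimate with respect to theta at this
    level, supplied by the caller.
    """
    m = mean(estimates)
    v = pvariance(estimates)  # assumption: divide-by-n variance
    return {"mean": m, "variance": v, "bias": m - theta,
            "information": slope ** 2 / v}


def central_slopes(thetas, means):
    """Finite-difference stand-in for the derivative of the polynomial
    regression of the mean estimate on theta (one-sided at the two ends)."""
    n = len(thetas)
    slopes = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        slopes.append((means[hi] - means[lo]) / (thetas[hi] - thetas[lo]))
    return slopes


# Toy illustration with made-up estimates at theta = -3.0.
print(summarize_level(-3.0, [-2.0, -2.2, -2.4], slope=0.5))
```

The squared slope in the numerator matters: where the mean estimate changes
slowly with θ, as happens when estimates regress toward the prior mean,
information is reduced even if the variance of the estimates is small.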
Independent Variables 


Eight data sets were analyzed for three levels of item discrimination. The
characteristics of the three studies and the data sets are summarized in
Table 1.

Study I: Accurate prior θ estimate. This study was intended to provide
"best case" data in order to serve as a benchmark against which other studies
could be evaluated. The "best case" for the Bayesian adaptive test ought to be
one involving a "perfect" item pool and accurate prior knowledge about
examinees' trait levels. Accurate prior knowledge means that each examinee's
trait level was known beforehand and was used as the mean of the Bayes prior
distribution. Under these conditions the only limitations on the information and
accuracy of estimate of Owen's procedure are those imposed by the test length
and by the discriminations and guessing parameters of the simulated test items.
Holding those variables constant, any idiosyncrasies in the behavior of the test
scores must be due to the trait level estimation and item difficulty selection
procedure.

Two separate and independent test administrations were simulated for each
of the 3,100 simulees: in Data Set 1, all item discriminations were .80, and in
Data Set 2, a = 1.60. For each simulee, the Bayes initial prior distribution
was normal, with mean θ_i and variance 1.0. Thus, at the outset of testing, the
initial estimate of each simulee's trait level was accurate. The adaptive test
was allowed to run its normal course, re-estimating θ_i after every item
response and selecting the next item accordingly, until 20 items had been
administered.

                                 Table 1
                  Summary of the Independent Variables
                          in the Three Studies

  Study   Data Set     a     Prior Mean   Termination Criterion
    I         1       .80       θ_i       20 items
    I         2      1.60       θ_i       20 items
   II         3       .80        0        20 items
   II         4      1.60        0        20 items
   II         5      2.40        0        20 items
  III         6       .80        0        V_k < .10 or 30 items
  III         7      1.60        0        V_k < .10 or 30 items
  III         8      2.40        0        V_k < .10 or 30 items


Study II: Constant prior θ estimate with fixed test length. Study II
replicated the 20-item fixed test length and constant a values of .80 and 1.60
from Study I; to examine effects with more highly discriminating items, Data
Set 5 used a = 2.40 for all items, while Data Sets 3 and 4 used items with
a = .80 and 1.60, as in Study I. In contrast to Study I, the three data sets of
Study II used the same initial normal prior distribution (mean = 0,
variance = 1.0) for all simulees, regardless of actual trait level. In this
study, then, a more typical use of the Bayesian adaptive testing strategy was
simulated, i.e., the application to individuals for whom no θ estimates were
available prior to testing; consequently, a group prior θ distribution was used
to select the first item to be administered. As in Study I, a fixed-length test
of 20 items was administered to each simulee.


Study III: Constant prior θ estimate with variable test length. In Study
III, as in Study II, the same initial normal (0,1) prior distribution was
assumed for all simulees. The difference between the studies was in the test
termination criterion. In Study III, testing was terminated for each simulee
whenever the posterior variance V_k fell below .10. This value corresponds to
the "standard error of estimate" criterion of .3162 specified by Urry (1974) to
achieve a fidelity coefficient exceeding .95 in a normal (0,1) population of
examinees. A maximum test length of 30 items was imposed, so that if the
posterior variance criterion had not been reached within 30 items, testing was
terminated. As for Study II, three levels of item discrimination (a = .80,
1.60, and 2.40) were studied in Data Sets 6, 7, and 8, respectively.
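
The overall flow of one simulated administration, under either termination rule,
can be sketched as follows. One substitution should be stated plainly: Owen's
procedure updates a normal approximation to the posterior in closed form, and
those updating equations are not reproduced in this paper, so the sketch
replaces them with a brute-force posterior on a grid of θ values. It therefore
illustrates the item selection, response generation, and stopping logic only,
and would not be expected to reproduce the bias of Owen's estimates. All
function names and the grid are hypothetical.

```python
import math
import random

D = 1.7
GRID = [-5.0 + 0.05 * i for i in range(201)]  # assumed theta grid, -5 to 5


def p_correct(theta, a, b, c):
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def optimal_difficulty(estimate, a, c):
    # Most informative difficulty at the current estimate (Birnbaum's result).
    return estimate - math.log((1.0 + math.sqrt(1.0 + 8.0 * c)) / 2.0) / (D * a)


def posterior_moments(weights):
    total = sum(weights)
    m = sum(w * t for w, t in zip(weights, GRID)) / total
    v = sum(w * (t - m) ** 2 for w, t in zip(weights, GRID)) / total
    return m, v


def adaptive_test(true_theta, a, c=0.20, prior_mean=0.0, prior_var=1.0,
                  max_items=30, var_criterion=0.10, rng=random):
    """Return (k, posterior mean, posterior variance) for one simulee.

    Pass var_criterion=None and max_items=20 for the fixed-length conditions.
    """
    weights = [math.exp(-0.5 * (t - prior_mean) ** 2 / prior_var) for t in GRID]
    estimate, variance = prior_mean, prior_var
    k = 0
    while k < max_items and (var_criterion is None or variance >= var_criterion):
        b = optimal_difficulty(estimate, a, c)
        u = 1 if p_correct(true_theta, a, b, c) >= rng.random() else 0
        likelihood = [p_correct(t, a, b, c) for t in GRID]
        if u == 0:
            likelihood = [1.0 - p for p in likelihood]
        weights = [w * lk for w, lk in zip(weights, likelihood)]
        estimate, variance = posterior_moments(weights)  # grid stand-in update
        k += 1
    return k, estimate, variance


print(adaptive_test(1.0, 1.6, rng=random.Random(7)))
```

The stopping value of .10 corresponds to the standard error criterion quoted
above, since the square root of .10 is .3162.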



Results 



Accurate Prior θ Estimate



Bias of the ability estimates for the two data sets of Study I is shown in
Figure 1 (numerical values of bias and information for Data Sets 1 and 2 are in
Appendix Table A). As Figure 1 shows, there was virtually no bias in the ability
estimates for Data Set 2 (a = 1.6), with a small amount of bias alternating
between positive bias and negative bias for Data Set 1 (a = .8). The maximum
amount of bias observed in the data was at θ = +3, where mean bias was -.10; a
similar degree of bias was observed at θ = -1.8.
Figure 1

Bias as a Function of θ for Data Sets 1 and 2

Figure 2 shows information curves for Data Sets 1 and 2. As the results
show, the information for Data Set 1 was relatively flat throughout the θ range.
The maximum information was observed at θ = -.5, with minimum information at
θ = +.2. Information ranged between 7 and 11, with only minor variations across
the ability range. The information for Data Set 2 was relatively flat, but not
as flat as that for Data Set 1. There was a spike at θ = .8 with a secondary
peak at θ = -2.8, and overall more variability between θ levels than for Data
Set 1. In general, there is a slight concave trend to the information values for
Data Set 2, with the exception of the spike at θ = .8. However, the general
trend is a relatively flat information function for both data sets.

Figure 2

Information as a Function of θ for Data Sets 1 and 2

Constant Prior θ Estimate with Fixed Test Length

Figure 3 shows the bias in the θ estimates for the data sets of Study II at
each of the three levels of item discrimination (numerical values of bias and
information are in Appendix Table B). For all three data sets there is a
negative slope for the bias curve, with low θ values being overestimated and
higher θ values being underestimated. In addition, there are some substantial
differences in the bias curves for the three levels of discrimination. Data
Set 3 (a = .8) achieved the highest levels of bias of all three data sets. Very
severe bias was observed for negative θ levels and severe bias in the opposite
direction for positive θ levels. When item discriminations were increased in
Data Set 4, there was only a slight drop in the positive bias for low θ levels
and a more substantial drop in negative bias for the θ levels above the mean.
Increasing the item discriminations to 2.4 in Data Set 5 resulted in virtually
no change in bias for low θ levels but a further decrease in bias for the
positive θ levels, with the range of unbiased ability estimates varying from
approximately θ = -1 to θ = +1.5 in Data Set 5. As these results show, the
effect of increasing item discrimination is to reduce bias somewhat, primarily
for high θ levels. For low θ levels (θ < -2.0) substantial levels of bias (.20
or more) were observed for the highly discriminating items of Data Set 5.
Figure 3

Bias as a Function of θ for Data Sets 3, 4, and 5
(Data Set 3, a = .8; Data Set 4, a = 1.6; Data Set 5, a = 2.4)





Figure 4 shows test information curves for the three data sets of Study II.
As Figure 4 shows, with the low discriminating items (a = .8) of Data Set 3,
test information was relatively flat for θ levels above about θ = -1.5, with a
decrease in information below that level. As item discrimination is increased,
the results for Data Set 4 show the information curve peaking, with relatively
lower information levels for θ > 1.6 and θ < -1.5, and a greater asymmetry in
the information curve. Finally, when the items of Data Set 5 (a = 2.4) were
used, the information curve becomes even more peaked and more variable, with
high levels of information generally in the range of θ = +1 to -1, and with
information dropping off extremely quickly beyond that range. For θ levels below
-1, there is little difference in information when item discriminations are
increased from a = 1.6 to a = 2.4. For θ levels below -1.8, levels of
information are not increased by increasing item discriminations.

Figure 5

Bias as a Function of θ for Data Sets 6, 7, and 8

Constant Prior θ Estimate with Variable Test Length



Figure 5 shows bias functions for the three data sets of Study III
(numerical values for bias and information are in Appendix Tables C, D, and E).
As the results show, least bias for low θ levels was observed for Data Set 6
(a = .8), while the high θ levels obtained the highest degree of bias for that
data set. As item discriminations increased, bias for low θ levels increased,
while bias for the high θ levels decreased. Extremely high levels of bias were
observed for Data Set 7 (a = 1.6) and Data Set 8 (a = 2.4) for θ levels less
than θ = -2.
Figure 6 shows test information functions for the variable-length
conditions of Data Sets 6 through 8. The information function that most
approximated the horizontal and equiprecise ideal was achieved by Data Set 6
(a = .8), which obtained relatively constant levels of information for θ values
greater than θ = -1.5. As item discrimination was increased, the level of
information obtained for low θ levels decreased, while the level of information
obtained for high θ levels remained similar. The result of increasing item
discrimination was a general increase in peakedness and asymmetry of the test
information functions.

Figure 6

Information as a Function of θ for Data Sets 6, 7, and 8
(Data Set 6, a = .8; Data Set 7, a = 1.6; Data Set 8, a = 2.4)




Figure 7 shows the mean number of items administered for each of the θ
levels for the data sets of Study III (numerical values are in Appendix Tables
C, D, and E). As expected, more items were needed in Data Set 6, which had lower
item discriminations, than in Data Sets 7 and 8. The results show that in Data
Set 6, 30 items was generally not sufficient, on the average, for the adaptive
test to achieve the specified level of posterior variance (.10) at most θ
levels. The results also show that test length required was an increasing
function of θ for Data Sets 7 and 8. While, on the average, the posterior
variance termination criterion of .10 was achieved with about 8.5 items for low
θ values in Data Set 7, twice the number of items (17.0) were necessary to
achieve the same posterior variance termination criterion (on the average) for
θ = +3. The same trend was observed for the more highly discriminating items of
Data Set 8.

Figure 7

Mean Number of Items Administered as a Function of θ
for Data Sets 6, 7, and 8

Discussion and Conclusions

This study used a "perfect" item pool in order to evaluate the performance
of Owen's Bayesian adaptive testing strategy under ideal conditions. The
results show that in terms of achieving statistically unbiased measurement and
measurements of equal precision throughout the range of ability, Owen's adaptive
testing strategy achieves these desirable goals only under the extremely
unrealistic condition of an accurate prior ability estimate. Of course, in a
realistic testing situation, the examinee's ability is not known beforehand;
otherwise, testing would not be necessary. Thus, the data of Study I serve only
as an unrealistic baseline condition to which results of other more realistic
testing conditions can be compared. Even under the unrealistic conditions of
Study I, however, there was a tendency for increasing item discrimination to
result in increasing variability in levels of information as a function of θ.

Studies II and III evaluated Owen's Bayesian testing strategy under the
more realistic testing conditions of a constant prior θ estimate, with both
fixed and variable test length. The results of Studies II and III show that this
adaptive testing strategy does not achieve unbiased measurement or measurements
of equal precision when a constant prior θ estimate is used for all examinees,
regardless of whether test length is fixed or variable. The results show an
interaction of the termination criterion with the performance of the adaptive
testing strategy, both in terms of bias and information.

When a constant test length is used, increasing item discrimination results
in decreased bias, with a more substantial decrease in bias for high θ levels.
When variable termination is used, increasing item discrimination results in
only slightly decreased bias for high θ levels, but in increased bias for low θ
levels, with extremely high levels of bias for very low θ levels. In terms of
information, the flattest information curves were observed for both termination
criteria with the least discriminating items. As item discrimination was
increased, in both cases the information curve became more peaked and
asymmetric, with a greater degree of asymmetry observed for the variable-length
testing condition. Results also showed that different mean numbers of items were
necessary to achieve a fixed posterior variance termination criterion at
different levels of θ. With moderately and highly discriminating items (a = 1.6
and a = 2.4), twice as many items were necessary, on the average, for high θ
levels to reach a posterior variance termination criterion of .10 as for low θ
levels.

Because this study used a perfect item pool in which items of a specified
discrimination were available at any level of difficulty, the results observed
in these studies cannot be attributed to deficiencies in the item pool, as might
be the case for the results reported by Gorman (1980). Rather, these results
are attributable to the effect of the constant prior θ estimate, as is shown by
the comparison of results between Studies II and III and those of Study I.
Although the effect of Urry's (1977) correction for regression was not
explicitly examined in these studies, it is unlikely that it would have the
desired effects under both the fixed-length and variable-length test conditions,
since, as indicated, there was interaction of observed bias with the termination
criterion.

Although a major purpose of adaptive testing is to provide measurements
with equal precision/information at all levels of the ability continuum (Weiss,
1982), results of these analyses show that under the realistic conditions of a
constant prior θ estimate, Owen's Bayesian adaptive testing strategy does not
achieve this desirable goal. Since the test information curves utilize some of
the same data from which the bias curves were computed, the results for
information are in a sense a consequence of the bias in the θ estimates. The
data from these three studies show that the bias results from use of a constant
prior θ estimate. Further research will be necessary to determine whether and to
what degree the use of variable prior θ estimates will affect the performance of
Owen's adaptive testing strategy in terms of reducing the bias and,
consequently, improving the equiprecision of its ability estimates.
Appendix: Supplementary Tables

Table A

Mean and Variance of θ̂, Bias and Information, as a Function of θ
for the Data Sets of Study I
Table B

Mean and Variance of θ̂, Bias and Information, as a Function of θ
for the Data Sets of Study II

              Data Set 3                    Data Set 4                    Data Set 5
   θ    Mean  Var.  Bias   Info.      Mean  Var.  Bias   Info.      Mean  Var.  Bias   Info.
-3.0  -2.166  .103   .83   2.645    -2.308  .161   .69    .945    -2.229  .189   .77    .389
-2.8  -2.084  .193   .72   1.634    -2.169  .162   .63   1.273    -2.097  .228   .70    .544
-2.6  -2.017  .209   .58   1.716    -2.048  .155   .55   1.710    -2.077  .163   .52   1.130
-2.4  -1.896  .133   .50   3.018    -1.957  .215   .44   1.521    -1.992  .114   .41   2.204
-2.2  -1.696  .161   .50   2.755    -1.958  .071   .24   5.505    -1.871  .141   .33   2.296
-2.0  -1.621  .144   .38   3.364    -1.770  .121   .23   3.765    -1.834  .062   .17   6.442
-1.8  -1.463  .103   .34   5.083    -1.582  .080   .22   6.502    -1.588  .104   .21   4.585
-1.6  -1.304  .191   .30   2.936    -1.488  .062   .11   9.410    -1.486  .062   .11   8.940
-1.4  -1.118  .188   .28   3.167    -1.335  .045   .07  14.322    -1.332  .055   .07  11.459
-1.2  -1.008  .143   .19   4.386    -1.128  .084   .07   8.364    -1.147  .043   .05  16.359
-1.0   -.846  .137   .15   4.789     -.972  .040   .03  18.923     -.987  .018   .01  42.925
 -.8   -.697  .104   .10   6.554     -.723  .049   .08  16.465     -.781  .024   .02  34.863
 -.6   -.567  .146   .03   4.819     -.593  .058   .01  14.682     -.579  .033   .02  27.112
 -.4   -.350  .125   .05   5.775     -.432  .065  -.03  13.704     -.414  .035  -.01  27.021
 -.2   -.215  .157  -.02   4.689     -.201  .046   .00  20.085     -.193  .025   .01  39.563
 0.0   -.014  .115  -.01   6.491     -.052  .048  -.05  19.805     -.009  .033  -.01  31.035
  .2    .188  .160  -.01   4.705      .155  .040  -.04  24.265      .192  .020  -.01  52.523
  .4    .380  .133  -.02   5.675      .355  .051  -.05  19.288      .404  .026   .00  41.064
  .6    .517  .152  -.08   4.952      .544  .038  -.06  26.043      .612  .028   .01  38.412
  .8    .715  .143  -.09   5.220      .775  .049  -.02  20.172      .803  .022   .00  48.816
 1.0    .866  .147  -.13   5.008      .942  .038  -.06  25.792      .974  .023  -.03  46.216
 1.2    .959  .117  -.24   6.169     1.132  .050  -.07  19.294     1.214  .030   .01  34.756
 1.4   1.197  .111  -.20   6.339     1.350  .059  -.05  15.974     1.396  .031   .00  32.690
 1.6   1.393  .160  -.21   4.260     1.538  .074  -.06  12.345     1.591  .030  -.01  32.517
 1.8   1.548  .108  -.25   6.075     1.728  .054  -.07  16.266     1.763  .030  -.04  30.984
 2.0   1.650  .174  -.35   3.605     1.898  .056  -.10  14.950     1.951  .031  -.05  28.261
 2.2   1.873  .123  -.33   4.840     2.130  .046  -.07  17.189     2.164  .026  -.04  31.384
 2.4   1.978  .179  -.42   3.132     2.265  .050  -.1?  14.785     2.362  .027  -.04  27.781
 2.6   2.144  .130  -.46   4.028     2.466  .045  -.13  15.191     2.538  .029  -.06  23.429
 2.8   2.292  .178  -.51   2.721     2.583  .058  -.22  10.766     2.709  .027  -.09  22.413
 3.0   2.386  .133  -.61   3.335     2.737  .045  -.26  12.500     2.847  .031  -.15  17.049



Table C

Mean and Variance of θ̂, Bias, Information,
and Mean and Standard Deviation of Number of
Items Administered as a Function of θ
for Data Set 6